Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace. Several families of communication-reduction methods, such as quantization, large-batch methods, and gradient sparsification, have been proposed. To date, gradient sparsification methods, in which each node sorts gradients by magnitude and communicates only a subset of the components while accumulating the rest locally, are known to yield some of the largest practical gains. Such methods can reduce the amount of communication per step by up to three orders of magnitude while preserving model accuracy. Yet this family of methods currently has no theoretical justification; this is the gap we address in this paper. We prove that, under analytic assumptions, sparsifying gradients by magnitude with local error correction provides convergence guarantees for data-parallel SGD, for both convex and non-convex smooth objectives. The main insight is that, thanks to selection by magnitude, sparsification methods implicitly maintain bounds on the maximum impact of stale updates. Our analysis and empirical validation also reveal that these methods do require certain analytic conditions to converge well, justifying existing heuristics.
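A minimal sketch of the magnitude-based sparsification with local error accumulation described above, written in NumPy for a single worker on a toy objective; the function names, the quadratic objective, and the hyper-parameters (k, lr) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude components of grad; zero out the rest."""
    sparse = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse[idx] = grad[idx]
    return sparse

def sgd_step_with_error_feedback(params, grad, error, k, lr):
    """One local step: correct the gradient with the accumulated error, transmit
    only the top-k components, and keep the untransmitted remainder locally."""
    corrected = grad + error                   # add back what was not sent before
    transmitted = topk_sparsify(corrected, k)  # communicate only k components
    error = corrected - transmitted            # accumulate the rest locally
    params = params - lr * transmitted         # apply the sparse update
    return params, error

# Toy usage on f(x) = 0.5 * ||x||^2, whose gradient is x.
params = np.random.randn(1000)
error = np.zeros_like(params)
for _ in range(200):
    grad = params
    params, error = sgd_step_with_error_feedback(params, grad, error, k=10, lr=0.1)
print("final norm:", np.linalg.norm(params))
```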
Applications such as employees sharing office spaces over a workweek can be modeled as problems where agents are matched to resources over multiple rounds. Agents' requirements limit the set of compatible resources and the rounds in which they want to be matched. Viewing such an application as a multi-round matching problem on a bipartite compatibility graph between agents and resources, we show that a solution (i.e., a set of matchings, with one matching per round) can be found efficiently if one exists. To cope with situations where a solution does not exist, we consider two extensions. In the first extension, a benefit function is defined for each agent and the objective is to find a multi-round matching to maximize the total benefit. For a general class of benefit functions satisfying certain properties (including diminishing returns), we show that this multi-round matching problem is efficiently solvable. This class includes utilitarian and Rawlsian welfare functions. For another benefit function, we show that the maximization problem is NP-hard. In the second extension, the objective is to generate advice to each agent (i.e., a subset of requirements to be relaxed) subject to a budget constraint so that the agent can be matched. We show that this budget-constrained advice generation problem is NP-hard. For this problem, we develop an integer linear programming formulation as well as a heuristic based on local search. We experimentally evaluate our algorithms on synthetic networks and apply them to two real-world situations: shared office spaces and matching courses to classrooms.
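To make the per-round structure concrete, here is a minimal sketch that treats rounds independently and checks, round by round, whether a maximum bipartite matching covers every agent active in that round. The data layout (compatible, active_rounds) and the round-independence assumption are illustrative simplifications, not the paper's formulation.

```python
import networkx as nx

def multi_round_matching(agents, resources, compatible, active_rounds, num_rounds):
    """compatible[a]: resources agent a accepts; active_rounds[a]: rounds in which
    agent a wants to be matched. Returns one matching per round (agent -> resource),
    or None if some round cannot cover all of its active agents."""
    solution = []
    for r in range(num_rounds):
        active = [a for a in agents if r in active_rounds[a]]
        if not active:                       # nothing to match in this round
            solution.append({})
            continue
        G = nx.Graph()
        G.add_nodes_from(active, bipartite=0)
        G.add_nodes_from(resources, bipartite=1)
        G.add_edges_from((a, res) for a in active for res in compatible[a])
        matching = nx.bipartite.maximum_matching(G, top_nodes=active)
        if any(a not in matching for a in active):   # some agent left unmatched
            return None
        solution.append({a: matching[a] for a in active})
    return solution

# Toy usage: two employees sharing one desk on different days.
agents = ["alice", "bob"]
resources = ["desk1"]
compatible = {"alice": {"desk1"}, "bob": {"desk1"}}
active_rounds = {"alice": {0}, "bob": {1}}
print(multi_round_matching(agents, resources, compatible, active_rounds, num_rounds=2))
```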
The prospect of combining human operators and virtual agents (bots) into an effective hybrid system that provides proper customer service is promising yet challenging. A hybrid system reduces customer frustration when bots cannot provide adequate service or when customers prefer to interact with a human operator. Moreover, we show that the cost and effort of building and maintaining such virtual agents can be reduced by enabling them to learn incrementally from human operators. We employ queueing theory to identify the key parameters that govern the behavior and efficiency of such hybrid systems, and to determine the main parameters that should be optimized in order to improve service. We formally prove, and demonstrate through extensive simulations and a user study, that with appropriate choices such a hybrid system can increase the number of customers served while reducing their expected waiting time and increasing their satisfaction.
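As a rough illustration of the queueing-theoretic view (not the paper's actual model), the sketch below uses a simple M/M/1 approximation in which a bot resolves a fraction p of requests and escalates the rest to a single human operator; the parameter names and values are assumptions for illustration only.

```python
def expected_sojourn_time(lam, mu, p, bot_time):
    """lam: arrival rate; mu: human service rate; p: fraction resolved by the bot;
    bot_time: time every customer spends with the bot before resolution/escalation."""
    lam_human = (1.0 - p) * lam              # only escalated traffic reaches the human
    if lam_human >= mu:
        return float("inf")                  # human queue is unstable
    human_sojourn = 1.0 / (mu - lam_human)   # M/M/1 expected time in system
    return bot_time + (1.0 - p) * human_sojourn

# As the bot learns (p grows), the expected time per customer shrinks.
for p in (0.0, 0.3, 0.6, 0.9):
    print(p, round(expected_sojourn_time(lam=0.8, mu=1.0, p=p, bot_time=0.05), 3))
```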
Many situations in which agents with restrictions compete for resources can be cast as a maximum matching problem on a bipartite graph. Our focus is on resource allocation problems where agents may have restrictions that make them incompatible with some resources. We assume that a principal chooses a maximum matching randomly, so that each agent is assigned a resource with some probability. Agents would like to improve their chances of being matched by modifying their restrictions within certain limits. The principal's goal is to advise an unsatisfied agent to relax its restrictions so that the total cost of the relaxation is within a budget (chosen by the agent) and the probability of the agent being assigned a resource is maximized. We establish hardness results for some variants of this budget-constrained maximization problem and present algorithmic results for other variants. We experimentally evaluate our methods on synthetic datasets as well as on two novel real-world datasets: a vacation activities dataset and a classrooms dataset.
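A heuristic sketch of the advice-generation idea, assuming the principal's random choice among maximum matchings can be approximated by breaking ties with tiny random edge weights; the helper names (sample_maximum_matching, estimate_match_prob, greedy_advice) and the greedy budgeted strategy are illustrative assumptions, not the paper's exact algorithms.

```python
import random
import networkx as nx

def sample_maximum_matching(edges):
    """Return one maximum-cardinality matching, breaking ties at random via tiny
    random edge weights (a stand-in for the principal's random choice)."""
    G = nx.Graph()
    for a, r in edges:
        G.add_edge(("A", a), ("R", r), weight=1.0 + 1e-6 * random.random())
    return nx.max_weight_matching(G, maxcardinality=True)

def estimate_match_prob(edges, agent, samples=200):
    """Monte-Carlo estimate of the probability that `agent` is matched."""
    hits = sum(
        any(("A", agent) in pair for pair in sample_maximum_matching(edges))
        for _ in range(samples)
    )
    return hits / samples

def greedy_advice(edges, agent, relax_options, budget):
    """relax_options: list of (cost, extra_edges) pairs, one per relaxable
    restriction, with cost > 0. Greedily add the option with the best estimated
    probability gain per unit cost until the budget is exhausted."""
    current, chosen = list(edges), []
    prob = estimate_match_prob(current, agent)
    remaining = list(relax_options)
    while budget > 0 and remaining:
        best_idx, best_ratio = None, 0.0
        for i, (cost, extra) in enumerate(remaining):
            if cost > budget:
                continue
            gain = estimate_match_prob(current + list(extra), agent) - prob
            if gain / cost > best_ratio:
                best_idx, best_ratio = i, gain / cost
        if best_idx is None:                 # no affordable option improves the estimate
            break
        cost, extra = remaining.pop(best_idx)
        chosen.append(extra)
        current += list(extra)
        prob = estimate_match_prob(current, agent)
        budget -= cost
    return chosen, prob
```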